This lesson uses the ggplot2 package, which is part of the tidyverse. ggplot2 graphics look better “out of the box”, and the syntax follows the tidyverse philosophy.

Check that you are working in the DataWorkshop project. If not, switch to it.

Create a new R Markdown file and save it as tidyplots.Rmd in the project directory DataWorkshop. After you save the file, it should appear in the Files tab, along with the file DataWorkshop.Rproj.

```r
library(tidyverse)
```

```
## ── Attaching packages ──────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.1     ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
```
The mpg tibble

The tidyverse comes with a built-in tibble called mpg. To open it in RStudio's data viewer, type View(mpg) in the console. You can also just echo the name:
```r
mpg
```

```
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 q… 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 q… 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 q… 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
```
Use the tidyverse data transformations to create a list of all the auto manufacturers, along with the average city mpg of their vehicles, sorted from most fuel-efficient to least.
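One possible approach to the exercise above, sketched with dplyr's group_by(), summarise() and arrange(). It assumes that "city mpg" means the cty column; the result name avg_cty is an illustrative choice, not part of the lesson:

```r
library(tidyverse)

# Average city mileage (cty) per manufacturer, most fuel-efficient first
avg_cty <- mpg %>%
  group_by(manufacturer) %>%
  summarise(avg_cty = mean(cty)) %>%
  arrange(desc(avg_cty))

avg_cty
```

Sorting with desc() puts the highest average first, which matches "most fuel-efficient to least".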
Most of this lesson is adapted from Chapter 3 of R for Data Science.
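ggplot() creates a blank graph.

ggplot(mpg) associates the mpg data with the graph.

ggplot(mpg) + geom_point() adds a scatterplot to the graph, but we still need to specify which variables to use.

aes() specifies which variables are represented by different properties of the graph (its aesthetics).

The layers of a plot are called geoms. You can add layers to the plot with +.

```r
# Highway mileage against engine displacement
ggplot(mpg) + geom_point(aes(x = displ, y = hwy))
```

```r
# Map vehicle class to point color
ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class))
```

```r
# Also map the number of cylinders to point size
ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class, size = cyl))
```

```r
# color set outside aes(): every point is drawn blue
ggplot(mpg) + geom_point(aes(x = displ, y = hwy), color = "blue")
```

```r
# "blue" inside aes() is treated as a data value, not a color: the points get
# ggplot2's default first color and a legend entry labeled "blue"
ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = "blue"))
```

Map a continuous variable to color, size, and shape.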
Map the same variable to multiple aesthetics.
What does the stroke aesthetic do? What shapes does it work with? (Hint: use vignette("ggplot2-specs").)
What happens if you map an aesthetic to something other than a variable name, like aes(color = displ < 5)? Note, you’ll also need to specify x and y.
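The difference between setting color outside aes() and mapping it inside aes() can be checked directly. This sketch uses ggplot2's layer_data(), which returns the data ggplot2 computes for a layer; the object names p_set and p_mapped are illustrative:

```r
library(ggplot2)

# color set as a fixed property, outside aes()
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# "blue" mapped inside aes(): a one-level variable colored by the default scale
p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Colors ggplot2 actually assigns to the points of the first layer
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

The first call returns "blue"; the second returns a single hex code from the default discrete palette, which is why the mapped points are not blue.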
```r
# Grid of panels: rows by drv, columns by cyl
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```

```r
# The . means no faceting in that dimension: columns by cyl only
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
```

```r
# One panel per class, wrapped into rows
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
```

⇒ In the lower-right pane, click on the Packages tab, then find the link for ggplot2. Clicking this link should bring up the package help pages; scroll down to the “G” section and observe all of the available geometries.
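geom_point:

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))
```

geom_point with geom_smooth:

```r
# A smoothed trend line instead of points
ggplot(mpg) +
  geom_smooth(aes(x = displ, y = hwy))
```

```r
# Points and trend line layered on the same plot
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy))
```

The method argument:

```r
# method = "lm" fits a straight line (linear model) instead of the default smoother
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm")
```

The linetype aesthetic:

```r
# One smooth per drive type, each drawn with a different line type
ggplot(mpg) +
  geom_smooth(aes(x = displ, y = hwy, linetype = drv))
```

Read ?facet_wrap. What does nrow do? What does ncol do? Why doesn’t facet_grid() have these arguments?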
Investigate the ggplot2 geometry functions (click on the ggplot2 link in the packages tab). What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
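The layered plots above repeat aes(x = displ, y = hwy) in every geom. A sketch of the common alternative: mappings passed to ggplot() are inherited by every layer, and an individual layer can still add its own local mappings. The names p and p_local are illustrative:

```r
library(ggplot2)

# Global mapping: both layers inherit x = displ and y = hwy
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# Local mapping: only the points are colored by class,
# so geom_smooth still fits a single line through all the data
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm")

p_local
```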
The diamonds tibble

Try View(diamonds) in the console to see the contents of this built-in tidyverse data set.
```r
diamonds
```

```
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # … with 53,930 more rows
```
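geom_bar() needs only an x aesthetic: its default stat ("count") tallies the rows in each category. A minimal sketch checking this against dplyr's count(), assuming layer_data() to read the computed bar heights; bar_heights and cut_counts are illustrative names:

```r
library(ggplot2)
library(dplyr)

# Bar chart of cut; the y values are computed, not supplied
p <- ggplot(diamonds) +
  geom_bar(aes(x = cut))

# Heights of the bars as computed by stat_count()
bar_heights <- layer_data(p)$count

# Number of diamonds per cut, counted directly with dplyr
cut_counts <- count(diamonds, cut)$n

bar_heights
cut_counts
```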
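```r
# Bar chart: geom_bar() counts the diamonds in each cut
ggplot(diamonds) +
  geom_bar(aes(x = cut))
```

```r
# Fill each bar by clarity (stacked)
ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity))
```

```r
# position = "fill" stretches every stack to the same height, showing proportions
ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity), position = "fill")
```

```r
# position = "dodge" places the clarity bars side by side
ggplot(diamonds) +
  geom_bar(aes(x = cut, fill = clarity), position = "dodge")
```

```r
# Boxplots of price for each cut
ggplot(diamonds) +
  geom_boxplot(aes(x = cut, y = price))
```

Because the data is the first argument to ggplot(), the boxplot code above does the same thing as:

```r
diamonds %>% ggplot() +
  geom_boxplot(aes(x = cut, y = price))
```

This makes it easy to transform the data before plotting, for example keeping only diamonds priced under 7500:

```r
diamonds %>% filter(price < 7500) %>% ggplot() +
  geom_boxplot(aes(x = cut, y = price))
```

```r
# Overlapping price densities, one per cut
ggplot(diamonds) +
  geom_density(aes(x = price, fill = cut))
```

```r
# alpha = 0.3 makes the fills semi-transparent so the overlaps stay visible
ggplot(diamonds) +
  geom_density(aes(x = price, fill = cut), alpha = 0.3)
```

Create a density plot for diamonds priced between 2500 and 7500, grouped by cut. Then make a histogram of the same thing.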
Try appending + coord_flip() to the end of a ggplot sum.
Try appending + coord_polar() to the end of a ggplot sum for some of the above boxplots. Can you figure out how to make a traditional pie chart? (Not that you ever should.)
Try appending + theme_classic(). Investigate other ggplot2 themes.
Using the mpg data, make an appropriate chart showing the average city mpg of each manufacturer, in order.
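For the pie-chart question, one common approach (not the only one) is a single stacked bar drawn in polar coordinates. The dummy x value factor(1) and width = 1 are choices made for this sketch, and pie_chart is an illustrative name:

```r
library(ggplot2)

# One stacked bar (all cuts at a single x position), then wrap the
# count axis (y) around the circle so each slice is proportional to its count
pie_chart <- ggplot(diamonds) +
  geom_bar(aes(x = factor(1), fill = cut), width = 1) +
  coord_polar(theta = "y")

pie_chart
```

Using theta = "y" turns the stacked heights into angles; with the default theta = "x" the result is a bullseye rather than a pie.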